Respective contribution of baseline clinical data, tumour metabolism and tumour blood-flow in predicting pCR after neoadjuvant chemotherapy in HER2 and Triple Negative breast cancer

Background: The aim of this study is to investigate the added value of combining tumour blood flow (BF) and metabolism parameters, including texture features, with clinical parameters to predict, at baseline, the pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) in patients with newly diagnosed breast cancer (BC).
Methods: One hundred and twenty-eight BC patients underwent an 18F-FDG PET/CT before any treatment. Tumour BF and metabolism parameters were extracted from first-pass dynamic and delayed PET images, respectively. Standard and texture features were extracted from BF and metabolic images. Prediction of pCR was performed using logistic regression, random forest and support vector classification algorithms. Models were built using clinical (C), clinical and metabolic (C+M) and clinical, metabolic and tumour BF (C+M+BF) information combined. Algorithms were trained on 80% of the dataset and tested on the remaining 20%. Univariate and multivariate feature selections were carried out on the training dataset. A total of 50 shuffle splits were performed. The analysis was carried out on the whole dataset (HER2 and Triple Negative (TN)), and separately in HER2 (N=76) and TN (N=52) tumours.
Results: In the whole dataset, the highest classification performances were observed for C+M models, significantly (p-value<0.01) higher than C models and better than C+M+BF models (mean balanced accuracy of 0.66, 0.61 and 0.64, respectively). For HER2 tumours, equal performances were noted for C and C+M models, with performances higher than C+M+BF models (mean balanced accuracy of 0.64 and 0.61, respectively). Regarding TN tumours, the best classification results were reported for C+M models, with better, although not significantly better, performances than C and C+M+BF models (mean balanced accuracy of 0.65, 0.63 and 0.62, respectively).
Conclusion: Baseline clinical data combined with global and texture tumour metabolism parameters assessed by 18F-FDG PET/CT provide a better prediction of pCR after NAC in patients with BC than clinical parameters alone, both for TN tumours and for HER2 and TN tumours combined. In contrast, adding BF parameters to the models did not improve prediction, regardless of the tumour subgroup analysed.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13550-024-01115-4.


Clinical and histopathological data
The age of the patients was recorded, as well as their menopausal status (postmenopausal or not) and whether the tumours were pluri-focal or not. From core biopsies, the histological type and the tumour grade, assessed with the Scarff-Bloom-Richardson system modified by Elston and Ellis (SBR), were included. The SBR grade and the mitotic SBR class were classified as high grade (grade 3) vs. low grade (grades 1 and 2). From the patients' TNM classification, the tumour (T1 and T2 vs. T3 and T4) and lymph node (N0 vs. N1-3) status were reported. The hormone receptor status (PR and ER) and the type of treatment were also included in the data.
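As an illustration only, the groupings described above could be encoded as binary model inputs along the following lines. This is a minimal sketch: the record layout and all field names are hypothetical, and histological type, mitotic SBR class and treatment type are left out for brevity. Only the cut-points (grade 3 vs. grades 1 and 2, T1-T2 vs. T3-T4, N0 vs. N1-3) come from the text.

```python
from dataclasses import dataclass


@dataclass
class ClinicalRecord:
    # Hypothetical field names, chosen for this sketch only.
    age: float
    postmenopausal: bool
    plurifocal: bool
    sbr_grade: int      # 1, 2 or 3
    t_stage: int        # 1 to 4
    n_stage: int        # 0 to 3
    er_positive: bool
    pr_positive: bool


def encode(rec: ClinicalRecord) -> dict:
    """Turn one record into numeric features using the groupings in the text."""
    return {
        "age": rec.age,
        "postmenopausal": int(rec.postmenopausal),
        "plurifocal": int(rec.plurifocal),
        "high_grade": int(rec.sbr_grade == 3),    # grade 3 vs. grades 1 and 2
        "t3_t4": int(rec.t_stage >= 3),           # T3-T4 vs. T1-T2
        "node_positive": int(rec.n_stage >= 1),   # N1-3 vs. N0
        "er_positive": int(rec.er_positive),
        "pr_positive": int(rec.pr_positive),
    }


example = encode(ClinicalRecord(54, True, False, 3, 2, 1, False, False))
print(example)
```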

Machine learning pipelines

4.1 Feature selection
Three types of classification algorithms were tested: Logistic Regression (LR), Support Vector Classification (SVC) and Random Forest (RF) classifiers, each associated with several feature selection methods. Univariate feature selections using two-sided Mann-Whitney tests or univariate LR were implemented, considering several p-value thresholds as significance criteria: 0.10, 0.15 and 0.20. A wrapper feature selection using Decision Trees (DT) was also carried out. In addition, an embedded feature selection method using the least absolute shrinkage and selection operator (LASSO) penalisation was implemented for the LR classifier. Finally, since RF classifiers are known to be robust against correlated features, this classifier was also tested without any feature selection, in addition to the above-mentioned feature selection methods. Spearman correlations between all the features were calculated either before the wrapper and embedded feature selections or after the univariate feature selection, since the latter considers the variables independently. To reduce multicollinearity, all the highly correlated features were removed using a cut-off value of ρ > 0.8. To prevent any bias, all the feature selections were performed using data from the training cohort only.
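The correlation filter described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the text does not say which feature of a correlated pair is discarded or whether the sign of ρ matters, so the sketch assumes that the first feature encountered is kept and that the absolute value of ρ is compared with the cut-off. The feature names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(features, threshold=0.8):
    """features: dict of name -> training-set values.

    Assumption: a feature is dropped when |rho| with an already kept
    feature exceeds the threshold, so earlier features take priority.
    """
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


train = {
    "suv_max": [1, 2, 3, 4, 5],
    "suv_peak": [1.1, 2.0, 3.5, 3.9, 6.0],  # same ordering as suv_max
    "bf_mean": [3, 1, 4, 2, 5],
}
print(drop_correlated(train))
```

In line with the last sentence of the paragraph above, such a filter would be fitted on the training split only and the resulting feature list then applied unchanged to the test split.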

Model development
Model hyperparameters were optimised via a grid search on the training dataset, using an inner 4-fold cross-validation (CV). A decision threshold of 0.5 was used for the hard classification performance metrics. Class balancing was investigated either by weighting the classes or by data augmentation, over-sampling the minority class with the Synthetic Minority Over-sampling Technique (SMOTE). All the optimal models were selected on the basis of balanced accuracy.
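For reference, balanced accuracy in a two-class problem is the mean of sensitivity and specificity. The sketch below computes it from predicted probabilities at the 0.5 decision threshold. Treating a probability exactly equal to the threshold as a positive prediction is an assumption, and the labels and probabilities are made up for the example.

```python
def balanced_accuracy(y_true, y_prob, threshold=0.5):
    """Mean of sensitivity and specificity for binary labels (1 = pCR)."""
    # Assumption: probabilities equal to the threshold count as positive.
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up example: 3 responders and 7 non-responders.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1]
print(balanced_accuracy(y_true, y_prob))

# A model that always predicts the majority class scores exactly 0.5,
# whatever the class imbalance.
print(balanced_accuracy(y_true, [0.0] * len(y_true)))
```

Unlike plain accuracy, this metric does not reward a model for simply predicting the more frequent class, which is why it suits model selection on imbalanced data.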

5 Results using a fixed number of bins (RR)

HER2 and TN
The highest performances were noted for C+M+BF models (mean BAcc=0.64) using an LR classifier with LASSO feature selection, enhanced by data augmentation using SMOTE. Similar performances were observed for C+M models, whose highest mean balanced accuracy was 0.63, obtained with an LR classifier with LASSO feature selection on weighted data, as reported in Figure S4.
The performances of these best models are reported in Table S1. All the models studied presented better classification performances than dummy models. The corresponding features selected in more than 50% of the splits are listed in Table S2.
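The "selected in more than 50% of the splits" criterion amounts to counting, for each feature, the share of shuffle splits in which it was retained. A minimal sketch, with hypothetical feature names and only four splits for brevity (the study used 50):

```python
from collections import Counter


def stable_features(selected_per_split, min_fraction=0.5):
    """Features retained in strictly more than min_fraction of the splits."""
    n_splits = len(selected_per_split)
    # set() guards against a feature being counted twice within one split.
    counts = Counter(f for split in selected_per_split for f in set(split))
    return sorted(f for f, c in counts.items() if c / n_splits > min_fraction)


# Hypothetical selections from four splits.
splits = [
    ["age", "suv_max"],
    ["suv_max"],
    ["suv_max", "grade"],
    ["age", "suv_max"],
]
print(stable_features(splits))
```

Here "age" appears in exactly half of the splits and is therefore not reported, since the criterion is strictly more than 50%.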

HER2
Considering HER2 tumours independently, the highest test performances were observed for C+M+BF models, with a mean balanced accuracy of 0.66. The best C and C+M models presented equal performances, with a mean balanced accuracy of 0.64. All these models used an LR classifier with LASSO feature selection on weighted data, as shown in Figure S5. The performances of these best models are reported in Table S3. All the models studied presented better classification performances than dummy models. The corresponding features selected in more than 50% of the splits are listed in Table S4.

TN
When only TN tumours were analysed, the highest performances (mean BAcc=0.65) were noted for C+M models using an LR classifier combined with Mann-Whitney (MW) feature selection, enhanced by data augmentation using SMOTE. When BF information was [...] The performances of these best models are reported in Table S5. All the models studied presented better classification performances than dummy models. The corresponding features selected in more than 50% of the splits are listed in Table S6.

Fig. S4
Summary of the mean balanced accuracy over the 50 shuffle splits for the test (n=26) and training (n=102) sets, for each subgroup (C, C+M and C+M+BF) among the HER2 and TN tumours combined. The best overall performance is highlighted in red; the other two highest results are highlighted in blue.

Fig. S5
Summary of the mean balanced accuracy over the 50 shuffle splits for the test (n=15) and training (n=61) sets, for each subgroup (C, C+M and C+M+BF) among the HER2 tumours. The best overall performance is highlighted in red; the other two highest results are highlighted in blue.

Fig. S6
Summary of the mean balanced accuracy over the 50 shuffle splits for the test (n=10) and training (n=42) sets, for each subgroup (C, C+M and C+M+BF) among the TN tumours. The best overall performance is highlighted in red; the other two highest results are highlighted in blue.

Table S1
Averages over the 50 shuffle splits of the best test and training model performances for each subgroup (C, C+M and C+M+BF) for the HER2 and TN tumours combined.

Table S2
Features selected in more than 50% of the shuffle splits of the best models for each subgroup (C, C+M and C+M+BF) when analysing HER2 and TN tumours combined.

Table S3
Averages over the 50 shuffle splits of the best test and training model performances for each subgroup (C, C+M and C+M+BF) for the HER2 tumours. BAcc: Balanced Accuracy, MCC: Matthews Correlation Coefficient, AUC: Area Under the ROC Curve

Table S4
Features selected in more than 50% of the shuffle splits of the best models for each subgroup (C, C+M and C+M+BF) when analysing HER2 tumours.

Table S5
Averages over the 50 shuffle splits of the best test and training model performances for each subgroup (C, C+M and C+M+BF) for the TN tumours. BAcc: Balanced Accuracy, MCC: Matthews Correlation Coefficient, AUC: Area Under the ROC Curve

Table S6
Features selected in more than 50% of the shuffle splits of the best models for each subgroup (C, C+M and C+M+BF) when analysing TN tumours.